VFILM: A Value Function Driven Approach to Information Lifecycle Management


Jeffrey Cleveland^a, Joseph P. Loyall^a, Jonathan Webb^a, James Hanna^b, Shane Clark^c
^a BBN Technologies, 10 Moulton Street, Cambridge, MA 02138
^b US Air Force Research Laboratory, 525 Brooks Rd, Rome, NY 13441
^c University of Massachusetts, 140 Governors Drive, Amherst, MA 01003


ABSTRACT 

Information  Management  (IM)  services  need  lifecycle  management,  i.e.,  determining  how  long  persistent  information  is 
retained  locally  and  when  it  is  moved  to  accommodate  new  information.  This  is  important  when  bridging  IM  services 
from  enterprise  to  tactical  environments,  which  can  have  limited  onboard  storage  and  be  in  highly  dynamic  situations 
with  varying  information  needs.  In  this  paper,  we  describe  an  approach  to  Value  Function  based  Information  Lifecycle 
Management  (VFILM)  that  balances  the  value  of  existing  information  to  current  and  future  missions  with  constraints  on 
available storage. VFILM operates in parallel with IM services in dynamic situations where missions and their information needs, the types of information being managed, and the criticality of information to current missions and operations are changing. In contrast to current solutions that simply move the oldest or least frequently accessed information when space is needed, VFILM manages information lifecycle based on a combination of inputs including attributes of the information (its age, size, type, and other observable attributes), ongoing operations and missions, and the relationships between different pieces of information. VFILM has three primary innovative features: (1) a fuzzy logic function that calculates an ordering of information value based on multiple relative valued attributes; (2) mission/task awareness that considers current and upcoming missions in information valuation and storage requirements; and (3) information grouping that treats related information collectively. This paper describes the VFILM architecture, a VFILM prototype that
works  with  Air  Force  Research  Laboratory  IM  services,  and  the  results  of  experiments  showing  VFILM's  effectiveness 
and  efficiency. 

1. INTRODUCTION

Bridging  Information  Management  (IM)  services  from  enterprise  to  tactical  environments  requires  lifecycle  management 
to  determine  how  long  information  is  retained  in  limited  onboard  storage  and  when  it  is  moved  to  make  room  for  new 
information. If IM systems, such as the USAF Phoenix IM Services [8], do not provide support for cleaning up information repositories that are reaching their saturation point, information will be lost (in an unmanaged manner), archive operations will fail, software exceptions will be thrown, or in the worst case the IM services will fail.

Repositories will fill up and are likely to do so when they are needed the most, even with modern disks with capacities of many gigabytes or terabytes. Consider that during Operation Anaconda in March 2002, U.S. air forces flew 65 combat sorties per day [15]. Thirty minutes of ISR video from a UAV in compressed MPEG-2 format requires 1.2 GB of
disk  space.  A  single  high  resolution  RGB  image  in  TIFF  format  (2248x2080  pixels)  such  as  might  be  used  for  battle 
damage  assessment  or  aimpoint  generation  requires  over  13  MB  of  space. 
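
The image figure above can be checked with simple arithmetic. The sketch below assumes 8 bits per colour channel and ignores TIFF header overhead; the per-day video total additionally assumes that every sortie yields one 30-minute clip, which is an illustrative assumption and not a figure from the text.

```python
# Back-of-the-envelope storage arithmetic for the figures quoted above.
MIB = 1024 ** 2  # bytes per mebibyte


def raw_rgb_bytes(width, height, bytes_per_pixel=3):
    """Size of an uncompressed RGB image (assumed 8 bits per channel)."""
    return width * height * bytes_per_pixel


tiff_bytes = raw_rgb_bytes(2248, 2080)
print(f"TIFF image: {tiff_bytes} bytes = {tiff_bytes / MIB:.1f} MiB")

# Hypothetical scenario (an assumption, not stated in the text):
# each of the 65 daily sorties produces one 1.2 GB video clip.
SORTIES_PER_DAY = 65
VIDEO_GB_PER_CLIP = 1.2
daily_video_gb = SORTIES_PER_DAY * VIDEO_GB_PER_CLIP
print(f"Video per day under that assumption: {daily_video_gb:.0f} GB")
```

Even under these rough assumptions, video alone would consume tens of gigabytes per day, which is why onboard level 0 storage cannot simply retain everything.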

An  Information  Lifecycle  Management  (ILM)  service  should  manage  how  and  what  gets  retained  in  each  storage  level, 
so  that 

•  Critical  information  urgent  to  ongoing  and  upcoming  missions  is  readily  accessible. 

•  High  speed,  high  cost  storage  is  used  for  information  that  is  most  critical  to  ongoing  and  upcoming  missions. 

•  Movement  of  information  is  based  on  a  decrease  in  value  to  ongoing  or  upcoming  missions  and  storage  space  being 
needed  for  higher  value,  more  critical  information. 

•  Support  for  information  repositories  and  IM  operations,  e.g.,  query  and  archive,  is  maintained. 

We have developed a Value Function based Information Lifecycle Management (VFILM) prototype with three novel advances over current ILM and Hierarchical Storage Management (HSM) systems:

1. Mission awareness that considers current and upcoming missions in information valuation and storage requirements;

2. A fuzzy logic function that calculates a partial order of information value based on multiple relative valued attributes; and

3. Information grouping that treats related information collectively.

Figure 1. Example levels of hierarchical storage.

2. RELATED WORK

Information Lifecycle Management and Hierarchical Storage Management. Existing ILM solutions take the following forms [3], [21]:

• Storage-centric offerings, i.e., multiple storage solutions with different capacity and price characteristics, plus software and consulting to use them (the point of view of storage vendors);

•  Technologies  for  Hierarchical  Storage  Management  (HSM),  i.e.,  automatically  moving  information  between  storage 
levels  (the  point  of  view  of  some  software  vendors);  or 

•  Business  processes  pertaining  to  the  value,  retention,  and  management  of  information  (the  point  of  view  of  some 
services  companies). 

Many  current  ILM  and  HSM  technologies  are  variations  on  backup  and  retrieval  software.  They  only  work  on  files  or 
documents;  move  data  based  on  age  or  time/frequency  of  access;  and  are  only  triggered  by  storage  full  situations  or  by 
time. 

Whereas much focus in ILM centers on the HSM part, most existing HSM offerings are mechanistic in nature, providing information movement and retrieval based on file systems and standardized control interfaces. They are invoked, manually or automatically, when storage space gets tight or based exclusively on time.

It is common in HSM to categorize levels of storage (level 0, level 1, level 2, etc.) based on their relative speed, capacity, and cost, as shown in Figure 1. In general, lower storage levels are considered higher speed, lower latency, more costly, and lower capacity. In operational terms, level 0 is the most accessible to ongoing missions (e.g., onboard storage on tactical platforms), whereas levels 1 and higher offer progressively more capacity at the cost of higher latency when edge platforms access information.

In reality, these levels and the media that occupy them do not form a total order. For example, the capacity of modern disk drives exceeds the capacity of a single optical disk, the cost of tape is not necessarily less than that of optical disks, and
the  capacity  of  multiple  optical  disks  and  multiple  tapes  is  comparable  (virtually  unlimited).  Furthermore,  some  of  the 
storage  media  are  not  very  relevant.  Generally,  it  is  not  relevant  to  consider  RAM  and  cache  when  discussing  HSM 
technology,  since  they  typically  fall  under  the  control  of  the  operating  system. 

Information Management Services. IM services have emerged as necessary concepts for information exchange in net-centric operations [10], [19], [20], including Net-Centric Enterprise Services (NCES) [5], which provide a set of services that enable access to and use of the Global Information Grid (GIG) [6]. The core concept of IM is an active information management model in which clients are information publishers and consumers that communicate anonymously with other clients via shared IM services, such as publication, discovery, brokering, archiving, and querying [4], [8]. Information is published in the form of typed information objects (IOs) consisting of a payload and indexable metadata describing the object and its payload. Consumers make requests for future (subscription) or past (query) information using predicates, e.g., via XPath [24] or XQuery [25], over IO types and metadata values.
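
To make the idea of a predicate over IO types and metadata concrete, the sketch below runs two XPath-style queries with Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset. The metadata layout (`archive`, `io`, `type`, `platform`) is a hypothetical schema chosen for illustration and is not the Phoenix metadata format.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata for three archived IOs; element names are illustrative.
ARCHIVE_XML = """
<archive>
  <io id="1"><type>BlueForceTrack</type><platform>uav-7</platform></io>
  <io id="2"><type>Imagery</type><platform>uav-7</platform></io>
  <io id="3"><type>BlueForceTrack</type><platform>awacs-2</platform></io>
</archive>
"""


def query(archive, predicate):
    """Return the ids of archived IOs whose metadata matches the predicate."""
    return [io.get("id") for io in archive.findall(predicate)]


archive = ET.fromstring(ARCHIVE_XML)
# Query by IO type, then by a metadata value.
bft_ids = query(archive, "./io[type='BlueForceTrack']")
uav7_ids = query(archive, "./io[platform='uav-7']")
print(bft_ids, uav7_ids)
```

A subscription would apply the same kind of predicate to IOs as they are published rather than to the archive.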

Common IM services include brokering (i.e., matching future published IOs to subscriptions), archiving of IOs, querying for archived objects, and dissemination of IOs to subscribing and querying clients, as shown in Figure 2. This approach is similar in concept to other publish-subscribe middleware, such as the Java Message Service (JMS) [10], Data Distribution Service (DDS) [19], CORBA Notification Service [20], or Web Services Notification (WS-N) [18].

In contrast to most of the solutions offered today, the military needs ILM solutions that are mission-driven, not simply triggered by a lack of available space or by time; mission-aware, not simply moving the oldest or least recently used data; and able to work with a variety of structured information objects, not just files or opaque documents.


Figure 2. Core Information Management Services.


3.  A  FUZZY  LOGIC  APPROACH  TO  INFORMATION  VALUATION 

One of the key challenges for ILM is determining when an IO’s value is sufficiently depreciated to warrant moving it from level 0 store, in favor of another IO to occupy the same space (because its value to ongoing operations is greater) or in favor of maintaining the space free for occupation by future IOs (because the potential value of the future IOs to ongoing operations is greater).

Whether information should be moved out of level 0 store comes down to a difficult-to-quantify predictive measure, i.e., whether the information will be needed soon (or ever). Furthermore, it is a relative assessment, i.e., whether the space in level 0 store occupied by an IO X is best used for X or a different IO, or left available for future information. In addition, multiple factors can go into deciding the relative worth of information objects, including the missions or operations that they are being used in (indicating how relevant they are to ongoing operations), the age of the IOs (indicating how fresh the information is), and the size of the IOs (indicating how much space they are using). Each of these factors has a relative interpretation. That is, whether an item of information is relevant enough to keep, or large or old enough to move, is relative to other items of information and to the anticipated use of the space if the information object is moved. Any discrete or static threshold for any of these factors will lead to inflexibility, i.e., it is likely to be suitable only in specific situations and not sufficient in others, and to potential thrashing.

Therefore,  we  developed  a  Value  Depreciation  Function  (VDF)  that  builds  a  partial  order  of  information  valuation  so 
that  at  any  time  when  information  needs  to  be  moved  to  make  room  in  level  0,  the  information  that  is  most  depreciated 
in  value  relative  to  the  others  will  be  moved.  We  use  a  fuzzy  logic  rule  based  approach  to  produce  the  partial  order  from 
relative  valued  inputs  because  of  the  following: 

• Whether an IO has sufficiently depreciated in value to move is neither completely true nor completely false. Instead, it is more or less true or false depending on the other choices available, such as how badly the space is needed, what else there is to move, and what the information will be used for.

• The factors that go into determining an item of information’s valuation lend themselves to relative interpretation. For example, whether an IO is old depends on the IO’s age relative to that of other IOs. Whether moving an IO will free up much space depends on the IO’s size relative to that of other IOs.

Fuzzy sets capture fuzzy, relative valued memberships better than traditional sets [26]. Fuzzy logic is a technique for making decisions based on combining the members of multiple fuzzy sets [9]. Traditional sets typically are described using a binary membership function, $m$, where a set $S = \{x \mid m(x) = 1\}$, i.e., $m(x) = 1$ means that $x$ is a member of the set and $m(x) = 0$ means that $x$ is not a member of the set. Alternatively, a traditional set $S$ is a pair, i.e., $S = (U, m)$, where $U$ is the universe over which the set $S$ can exist, and the function $m$ determines the membership of any element, $s \in U$, in $S$. If $m(s) = 1$, then $s \in S$. If $m(s) = 0$, then $s \notin S$. As an example, consider the set of all IOs in a repository. The set BFT can be defined as the traditional set of all IOs that have the type BlueForceTrack. That is, for a repository of IOs, $R$, $\mathit{BFT} = (R, f)$, where $f(i) = 1$ if $\mathrm{type}(i) = \mathrm{BlueForceTrack}$ and $f(i) = 0$ otherwise.

In contrast, consider defining the sets of all large IOs or all old IOs. Although the size and age of each IO is quantitative, the judgment of whether something is large or old is a relative, fuzzy concept. These are not as well described by traditional sets, because of the traditional sets’ binary notion of membership. For example, assume that the large set is defined as a traditional set over the universe of all IOs, with a membership function $f(i) = (\mathrm{size}(i) > 100)$. An IO of size 101 would be in the set large, as would an IO of size 1000, and an IO of size 1,000,000. All of these IOs would have the same membership in the set large, despite the orders of magnitude differences in their sizes. Conversely, an IO of size 99 would not be in the set large, despite being much closer to the IO of size 101 than the other elements in the large set.


Figure 3. Whereas traditional sets have binary membership functions, fuzzy sets have degrees of membership.

Figure 4. The combination of fuzzy input sets into relative membership in a Move set using fuzzy logic rules.


A fuzzy set is defined as a pair $F = (U, m)$ like the traditional set, but the function $m$ in a fuzzy set has a range in the interval $[0, 1]$, as shown in Figure 3. An element, $s \in U$, such that $m(s) = 0$ still means that $s$ is not a member of the set, i.e., $s \notin F$. Any non-zero value for $m(s)$ indicates the degree to which $s$ is a member of the set, with $m(s) = 1$ meaning that $s$ is fully in the set.


In our size example above, the IOs of size 101, 1000, and 1,000,000 would each have a membership degree > 0 and < 1, and the membership degree of the IO of size 1,000,000 would be larger than that of the IO of size 1000, which in turn would be larger than that of the IO of size 101. In this way, a fuzzy set, $F = (U, m)$, provides a partial order over its members.
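
The contrast between the two kinds of set can be sketched in a few lines. The crisp function uses the threshold of 100 from the example above; the fuzzy membership curve (a logistic function of the logarithm of the size, centered at 1000) is an assumed shape chosen only for illustration and is not the membership function used in the prototype.

```python
import math


def crisp_large(size, threshold=100.0):
    """Traditional set: membership is 1 above the threshold, otherwise 0."""
    return 1 if size > threshold else 0


def fuzzy_large(size, midpoint=1000.0, steepness=1.0):
    """Assumed fuzzy membership: logistic curve over log10(size).

    Strictly increasing in size, with a degree of 0.5 at the midpoint.
    """
    x = math.log10(size) - math.log10(midpoint)
    return 1.0 / (1.0 + math.exp(-steepness * x))


for size in (99, 101, 1000, 1_000_000):
    print(size, crisp_large(size), round(fuzzy_large(size), 3))
```

The crisp set separates sizes 99 and 101 completely while treating 101 and 1,000,000 identically; the fuzzy degrees differ only slightly for 99 and 101 and increase with size, which yields the ordering described above.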


Fuzzy  logic  is  a  technique  for  making  decisions  based  on  combining  the  members  of  multiple  fuzzy  sets  to  produce  a 
degree  of  membership  in  an  output  set,  as  shown  in  Figure  4.  It  consists  of  the  following  three  steps  [22]: 

•  Acquiring  a  number  of  input  values. 

•  Processing  the  inputs  according  to  a  set  of  fuzzy  logic  rules. 

•  Averaging  and  weighting  the  outputs  of  all  the  individual  rules  into  a  single  output  decision. 

Fuzzy logic has been used in various applications. The subway system in Sendai, Japan uses a fuzzy logic controller to control the subway train’s acceleration, slowing, and braking to ensure a smoother ride than position based controllers [11]. Fuzzy logic has also been used in air conditioning and heating system controllers [1], rice cookers [23], industrial automation [7], 3D animation software [16], and elevator controls [11], [17]. The International Electrotechnical Commission (IEC) standardized the Fuzzy Control Language, FCL, in IEC 61131-7 in 1997 [12].

We implemented the Value Depreciation Function using jFuzzyLogic, an open source Java implementation of the Fuzzy Control Language [13]. VFILM’s VDF uses fuzzy input sets including the relevance to current, future, or past missions, the size of an IO, and the age of an IO, shown in Figure 5 with their jFuzzyLogic descriptions. The x axis represents the measured value of the input, e.g., the age of an IO calculated from the current time and a creation timestamp, and the y axis represents the degree of membership in a particular fuzzy set.

A set of fuzzy logic rules combines the input values for a particular IO into a degree of membership in a Move set. The higher the membership, the more devalued the worth of the information is, and the more likely it is to be moved. Figure 6 shows a set of rules we used in our VFILM prototype. These rules combine the missionStatus (i.e., relevance), age, and ioSize membership values into a membership degree in the move output set, with missionStatus weighted higher (weight of 1) than age (weight of 0.5) and ioSize (weight of 0.25).

The move set represents the partial order of information (de)valuation: the IOs with the highest membership in move should be moved first when space is needed, because they are the least relevant to ongoing missions, the oldest, and the largest (so moving them frees the most space).
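
Using the move set as an ordering can be sketched as follows. The `ValuedIO` record and the `select_for_move` helper are hypothetical names introduced for illustration; the sketch assumes each IO already carries a move degree and simply takes IOs in descending order of that degree until the requested space is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValuedIO:
    io_id: str
    size: int           # bytes occupied in level 0
    move_degree: float  # membership in the move set, between 0 and 1


def select_for_move(ios, bytes_needed):
    """Pick IOs in descending move_degree until enough space is freed."""
    chosen, freed = [], 0
    for io in sorted(ios, key=lambda io: io.move_degree, reverse=True):
        if freed >= bytes_needed:
            break
        chosen.append(io)
        freed += io.size
    return chosen


ios = [ValuedIO("a", 100, 0.2), ValuedIO("b", 300, 0.9), ValuedIO("c", 200, 0.6)]
print([io.io_id for io in select_for_move(ios, 400)])
```

Because selection stops as soon as the request is met, only as much information leaves level 0 as is actually needed.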

This design offers considerable flexibility and extensibility in the VDF. New factors can be added to the valuation by introducing a new fuzzy input set. The rules for combining the fuzzy inputs into the output measure can be extended. Finally, the various input factors can be weighted so that some factors contribute more to the output set than others.

The VDF component of the ILM consists of the following pieces:

• Fuzzy sets representing the inputs and output of the VDF.

• Fuzzy logic rules that combine the inputs into a degree of membership in the output set.

• Functions that access the values for the fuzzy inputs, which can be stored in information metadata, Phoenix Context objects, system condition monitors, operating system attributes, etc.

Figure 5. Sample fuzzy input sets for the Value Depreciation Function.

    RULE 1 : IF missionStatus IS active THEN move IS unlikely;
    RULE 2 : IF missionStatus IS NOT active THEN move IS likely;
    RULE 3 : IF age IS young THEN move IS unlikely WITH 0.5;
    RULE 4 : IF age IS NOT young THEN move IS likely WITH 0.5;
    RULE 5 : IF ioSize IS small THEN move IS unlikely WITH 0.25;
    RULE 6 : IF ioSize IS large THEN move IS likely WITH 0.25;

Figure 6. Rules for combining the fuzzy inputs into a move valuation.
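
The effect of the six rules in Figure 6 can be approximated with a short sketch. It is a deliberate simplification: it uses a weighted average of rule outputs (with unlikely = 0 and likely = 1) rather than jFuzzyLogic's accumulation and defuzzification, and the membership breakpoints (hours and megabytes) are assumptions, since the shapes in Figure 5 are not reproduced here. Only the rule structure and the weights 1, 0.5, and 0.25 come from the text.

```python
def ramp_down(x, full, zero):
    """Membership 1 at or below `full`, 0 at or above `zero`, linear between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def ramp_up(x, zero, full):
    """Membership 0 at or below `zero`, 1 at or above `full`, linear between."""
    return 1.0 - ramp_down(x, zero, full)


def move_degree(mission_active, age_hours, size_mb):
    """Approximate membership in the move set for one IO.

    mission_active is a degree in [0, 1]; all breakpoints are assumed values.
    """
    active = mission_active
    young = ramp_down(age_hours, 1.0, 24.0)
    small = ramp_down(size_mb, 1.0, 100.0)
    large = ramp_up(size_mb, 10.0, 1000.0)
    # (rule activation, rule weight, output value: unlikely=0.0, likely=1.0)
    rules = [
        (active, 1.0, 0.0),        # RULE 1
        (1.0 - active, 1.0, 1.0),  # RULE 2
        (young, 0.5, 0.0),         # RULE 3
        (1.0 - young, 0.5, 1.0),   # RULE 4
        (small, 0.25, 0.0),        # RULE 5
        (large, 0.25, 1.0),        # RULE 6
    ]
    numerator = sum(a * w * out for a, w, out in rules)
    # Rules 1 and 2 together always contribute 1.0, so this is never zero.
    denominator = sum(a * w for a, w, _ in rules)
    return numerator / denominator


print(move_degree(1.0, 48.0, 2000.0))  # active mission, old and large IO
print(move_degree(0.0, 0.5, 0.5))      # inactive mission, fresh and small IO
```

Under these assumptions, an old, large IO tied to an active mission still scores lower than a fresh, small IO with no active mission, which reflects the higher weight given to missionStatus.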


4.  THE  VFILM  INFORMATION  LIFECYCLE  MANAGEMENT  SERVICE 

We  incorporated  the  information  valuation  function,  VDF,  into  an  ILM  service  that  provides  the  following  advantages 
over  existing  ILM  and  HSM  approaches: 


•  Multiple,  non-traditional  factors  can  be  used  in  the  VDF  to  value  information,  including  but  not  limited  to  mission 
factors,  relation  to  other  information,  and  information  characteristics,  and  the  set  of  factors  is  extensible. 

•  The  valuation  and  movement  of  information  can  be  triggered  by  mission,  system,  policy,  and  other  relevant  events 
through  an  easily  extendable  event-handler  implementation. 

•  The  information  valuation  and  movement  functions  are  separated,  so  that  either  can  be  scheduled  when  needed  or 
when  resources  are  available,  and  can  be  executed  as  much  as  needed,  e.g.,  to  recover  just  enough  space  to  continue. 

•  It  avoids  thrashing  around  fixed  storage  thresholds  or  drastic  purges  of  information  to  free  up  storage.  VFILM  treats 
information  valuation  as  a  partial  order  of  the  “criticality”  of  information,  so  as  much  or  little  can  be  moved  as 
needed. 

•  It  can  treat  groups  of  information  that  are  related  collectively,  so  that  they  are  valued  and  moved  as  a  group,  when 
appropriate. 

•  It  provides  a  rich  framework  for  specifying  factors  and  policies  for  valuing  and  moving  information.  New  FCL 
rules,  fuzzy  sets,  policies,  groups,  and  thresholds  are  readily  added  or  changed. 

The  VFILM  prototype  ILM  service  is  shown  in  Figure  7  and  consists  of  the  following  components: 


• The ILM Event Manager manages event handlers that handle events that can trigger information valuation and movement.

• The ILM Controller drives the behavior of the ILM in response to events.

• The Value Depreciation Function implements the valuation algorithm described in Section 3.

• The Group Manager maintains the definitions of groups of information.

• The ILM-HSM Adapter abstracts away the specifics of the HSM and Repositories being used. In the current prototype, the ILM-HSM Adapter implements representative HSM functionality.

4.1 The ILM Event Manager and Controller


As  shown  in  Figure  8,  the  ILM  Event  Manager 
maintains  a  set  of  Event  Handlers,  each  of  which 
receives  incoming  higher  level  events  and  maps 
them  to  ILM  events.  The  ILM  Controller  receives 
ILM  Events  from  the  Event  Handlers  and  invokes 
valuation,  movement,  or  update  functions. 

Generated events and discrete epochs, such as mission events and policy events, are delivered using an event channel. The consumer of the events is the Event Manager, which selects the appropriate event handler for each event based on the event type. Continuous conditions, such as the amount of free storage, can be monitored directly by event handlers.

Figure 7. Design of the ILM Service.

Each event handler maps the incoming or monitored higher-level events to a set of ILM events, and the set of ILM events is passed to the ILM controller for execution. The ILM events serve as the “language” of the ILM and trigger the ILM to conduct information valuation, information movement, group updates, and/or policy modification. We prototyped the following set of ILM events:

• NeedSpace - Indicates that a particular amount of space should be made available by moving information.

• Cleanup - Check the relative valuation of information across the hierarchical levels and rebalance the location of information, so that the most critical information (lowest depreciation valuation) is in level 0 store and less critical information (highest depreciation valuation) is in higher storage levels.

• UpdateThreshold - Change the threshold of available space that the ILM should maintain in level 0 storage.

• Valuation - Execute the VDF valuation function on a set of IOs (provided as a parameter).

• GroupUpdate - Create a new group or change the attributes of a group of IOs.

• RuleChange - Add or change a fuzzy logic rule determining the valuation of IOs.

• MoveIOs - Move a set of IOs from one repository to another (usually in different storage levels).

Event Handlers are pluggable and the VFILM prototype provides the following set:

•  Mission  Event  Handler  -  Reacts  to  incoming  Mission  Events. 

• File System Monitor - Monitors the level of free space and triggers a NeedSpace event when the available space drops below a specified threshold.

•  Group  Policy  Handler  -  Listens  for  events  associated  with  policy  governing  groups  of  information. 

•  Admin  Policy  Handler  -  Provides  ILM  administration,  such  as  setting  the  free  space  threshold  or  inserting  new 
Event  Handlers. 
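
As a minimal sketch of the File System Monitor behaviour described in the list above, the function below compares free space with a threshold and produces a NeedSpace request for the shortfall. The dictionary shape of the event and the function names are hypothetical; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def check_free_space(free_bytes, threshold_bytes):
    """Return a NeedSpace event when free space is below the threshold."""
    if free_bytes >= threshold_bytes:
        return None
    # Request exactly the shortfall, so no more is moved than necessary.
    return {"event": "NeedSpace", "bytes": threshold_bytes - free_bytes}


def poll(path, threshold_bytes):
    """Check the file system that holds `path` (the level 0 store)."""
    return check_free_space(shutil.disk_usage(path).free, threshold_bytes)


print(check_free_space(500, 1000))
```

A monitor of this kind would call `poll` periodically; an UpdateThreshold event would simply change the `threshold_bytes` value used on later polls.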

We  defined  the  following  Mission  Event  Types  for  the  VFILM  prototype: 

•  MissionPrep  -  Indicating  that  a  planned  mission  will  start  sometime  in  the  future  and  the  ILM  should  prepare  for  it. 

•  MissionBegin  -  Indicating  the  start  of  a  mission. 

•  MissionEnd  -  Indicating  the  end  of  a  mission. 

The Mission Event Handler maps these three Mission Event Types to the ILM Events and operations indicated in Table 1.


Table 1. Mission Events and mapping to ILM Events representing the prototype VFILM Mission Domain Model.

| Mission Event Type | ILM Events | Resulting ILM operations invoked by the ILM Controller |
|---|---|---|
| MissionPrep | Cleanup | Runs the valuation function on multiple storage levels and sorts the results so that the IOs are balanced across the storage levels according to their valuation and the storage thresholds. |
| MissionBegin | GroupUpdate | Creates a group representing the mission. |
| | Valuation | Triggers valuation of all IOs matching the mission predicate. |
| | NeedSpace | Moves IOs to free up enough available space for the mission. |
| MissionEnd | GroupUpdate | Removes the group associated with the mission. |
| | Valuation | Triggers valuation of all IOs associated with the mission, i.e., matching the mission predicate. |
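
The mapping in Table 1, together with the dispatch performed by the Event Manager, can be sketched as follows. The class and method names are hypothetical and the controller is reduced to a callable; only the event names and their mapping come from the table.

```python
# Mission Event Type -> ILM Events, as listed in Table 1.
MISSION_EVENT_MAP = {
    "MissionPrep": ["Cleanup"],
    "MissionBegin": ["GroupUpdate", "Valuation", "NeedSpace"],
    "MissionEnd": ["GroupUpdate", "Valuation"],
}


class MissionEventHandler:
    """Maps a higher-level mission event to a list of ILM events."""

    def map(self, event):
        return [
            {"ilm_event": name, "mission": event["mission"]}
            for name in MISSION_EVENT_MAP[event["type"]]
        ]


class EventManager:
    """Selects a handler by event kind and forwards ILM events to the controller."""

    def __init__(self, controller):
        self._controller = controller  # callable that executes one ILM event
        self._handlers = {}

    def register(self, kind, handler):
        self._handlers[kind] = handler

    def on_event(self, kind, event):
        for ilm_event in self._handlers[kind].map(event):
            self._controller(ilm_event)


executed = []
manager = EventManager(executed.append)
manager.register("mission", MissionEventHandler())
manager.on_event("mission", {"type": "MissionBegin", "mission": "A"})
print([e["ilm_event"] for e in executed])
```

Keeping the mapping in a table-like structure is one way to make handlers pluggable: a new mission domain model only needs a different map.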

Figure 8. Design of the ILM Event Manager.


4.2  The  Group  Manager 

In many cases, IOs are not independent entities and there are significant advantages to having the ILM exploit the interdependencies. One realization of information interdependencies is association with a common group. As shown in Figure 9, information in a system can be associated with many overlapping and co-existing groups, based on shared types (e.g., blue force tracks, BFTs), source (e.g., a specific platform), role (e.g., intelligence, surveillance, and reconnaissance, ISR, for mission A), epoch (e.g., a sortie), location within a particular region, and so forth. VFILM supports the association of IOs that are related and that should be treated collectively into groups. Events can affect a group of IOs and IOs can be collectively valued and moved.

Groups are defined using predicates over observable attributes of IOs. The Group Manager maintains a collection of Group Contexts, which hold the following information about a group, as shown in Figure 10:

• Identifier - A name to identify the group.

• Predicate - A predicate defining the IOs in the group. The predicate is defined over observable attributes of an IO.

• Valuation rules - The VDF rules used for IO valuation in the group.

• Precedence - Used to determine which group definition is used during IO valuation when an IO is part of multiple groups.

• Stored values - Input values associated with a group and used during IO valuation.

Missions are represented as just another type of group. When a MissionBegin event occurs, a Group Context is created for IOs associated with the mission.
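
The Group Context fields listed above can be sketched as a small data structure. The names below are hypothetical, predicates are modeled as Python callables over a metadata dictionary, and the sketch assumes that a larger precedence number wins when an IO belongs to several groups; the text does not specify the direction of the precedence ordering.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class GroupContext:
    identifier: str
    predicate: Callable[[dict], bool]     # defined over observable IO attributes
    precedence: int                       # assumed: higher value takes priority
    valuation_rules: str = "default"      # name of the VDF rule set to apply
    stored_values: dict = field(default_factory=dict)


class GroupManager:
    def __init__(self):
        self._groups = {}

    def update(self, context):
        """Create a group or replace its attributes (GroupUpdate)."""
        self._groups[context.identifier] = context

    def remove(self, identifier):
        self._groups.pop(identifier, None)

    def governing_group(self, io) -> Optional[GroupContext]:
        """Return the matching group with the highest precedence, if any."""
        matches = [g for g in self._groups.values() if g.predicate(io)]
        return max(matches, key=lambda g: g.precedence, default=None)


groups = GroupManager()
groups.update(GroupContext("bft", lambda io: io.get("type") == "BlueForceTrack", 1))
# A mission is just another group, created when the mission begins.
groups.update(GroupContext("mission-A", lambda io: io.get("mission") == "A", 10,
                           stored_values={"missionStatus": 1.0}))
track = {"type": "BlueForceTrack", "mission": "A"}
print(groups.governing_group(track).identifier)
```

Removing the mission group at mission end makes the same IO fall back to the rules of its remaining groups on the next valuation.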


4.3  The  ILM-HSM  Adapter 

The ILM-HSM Adapter is a control interface from the ILM to HSM functionality that is intended to support a variety of HSM implementation options. As such, it provides a consistent interface for the ILM to specify IOs and files that should be moved independently of the specific HSM or repository that is used. In a situation where a full HSM solution is not appropriate, the ILM-HSM Adapter can be responsible for the movement of information. If the repository stores IOs as files on disk, this could involve moving the file and updating the repository’s reference or leaving a symbolic link to the file’s new location. With other repository implementations where IOs are stored in relational databases, this could involve removing the IO from one table and inserting it into another.
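
For the file-based case, the symbolic-link option can be sketched with the standard library. The function name and directory layout are hypothetical, and the sketch assumes a POSIX-style file system on which the process is allowed to create symbolic links.

```python
import os
import shutil


def move_payload(src_path, dest_dir):
    """Move an IO payload file to another storage level and leave a symlink.

    Readers that open the original path keep working, at the latency of
    the destination storage.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    shutil.move(src_path, dest_path)
    # Replace the original location with a link to the new one.
    os.symlink(os.path.abspath(dest_path), src_path)
    return dest_path
```

The alternative mentioned above, updating the repository's reference, avoids the link but requires the repository to know about the new location.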

In situations where an HSM is used, the ILM-HSM adapter serves as the interface to the HSM. This depends on the specifics of the HSM utilized but could involve assigning priority values to managed files based upon information value or modifying a management policy.

In the VFILM prototype, we simulated HSM functionality within the ILM-HSM Adapter. We manage a repository where metadata is stored in a Berkeley XML DB [2] and full IOs are stored as individual files on disk; an example layout is shown in Figure 11. The ILM-HSM Adapter can handle multiple repositories, can move just IOs while retaining metadata in the level 0 store, or can move metadata and IOs to level 1 store.

The  ILM-HSM  Adapter  maintains  the  following  two  extra  databases  on  each  level  of  storage  to  facilitate  information 
movement: 

• A Value Store - Contains the IO context ID, IO value (i.e., the result of the most recent valuation execution), and the storage level of the IO.

• An ILM Index - Maintains an index of the IOs in the level of storage indexed by the fields used to define group membership, allowing rapid lookup of IOs associated with a group.
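
The role of the ILM Index can be illustrated with an in-memory inverted index. This is only a sketch under assumed names (`IlmIndex`, `add`, `lookup`); the prototype keeps its index in a database on each storage level, which is not modeled here.

```python
from collections import defaultdict


class IlmIndex:
    """Maps (field, value) pairs to the ids of IOs that carry them."""

    def __init__(self, fields):
        self._fields = tuple(fields)    # fields used in group predicates
        self._index = defaultdict(set)

    def add(self, io_id, metadata):
        for name in self._fields:
            if name in metadata:
                self._index[(name, metadata[name])].add(io_id)

    def lookup(self, name, value):
        """Return the ids of all indexed IOs with metadata[name] == value."""
        return set(self._index.get((name, value), ()))


index = IlmIndex(["type", "mission"])
index.add("io-1", {"type": "BlueForceTrack", "mission": "A"})
index.add("io-2", {"type": "Imagery", "mission": "A"})
index.add("io-3", {"type": "BlueForceTrack"})
print(sorted(index.lookup("mission", "A")))
```

With such an index, a Valuation event for a mission group can find its IOs without scanning every IO in the storage level.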


Figure 9. Information can be organized into many groupings, some of which have associated lifecycles.

Figure 10. The Group Context contains the information needed to represent a group of IOs.


The ILM-HSM Adapter provides multiple options for moving IOs. The first option moves the payload of IOs only, retaining indexable metadata in level 0 store. This enables all queries to be performed on the repositories in level 0, although retrieval of results might need to reach into higher levels. Because retaining all of the metadata in level 0 store can still lead to the level 0 store filling up, the second option is to move metadata and IOs together. When the metadata is moved out of level 0, the query service must be aware of the hierarchical storage to conduct queries on the repositories in other levels. The movement of IOs only and the movement of metadata and IOs together can coexist.
5. THE ILM PROTOTYPE PERFORMANCE

Figure 11. Design of the ILM-HSM Adapter.

Figure 12 shows a screenshot of the VFILM prototype in action. It shows the IOs in the level 0 and level 1 storage, with the x axis showing the IO valuation result (higher values, i.e., IOs farther to the right, represent information that is more depreciated in value). The y axis indicates the number of IOs at each valuation level. The red IOs are in a repository on level 0 and the blue IOs are in a repository on level 1. In this screenshot, the ILM has perfectly balanced the IOs, with all the IOs in level 1 storage having a higher depreciation value than those in level 0 storage.
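
The balance condition described here can be stated as a simple check. The sketch below is illustrative only; the function name `is_balanced` is hypothetical, and treating ties between levels as balanced is an assumption.

```python
def is_balanced(level0_values, level1_values):
    """Return True if no level 1 IO is less depreciated than a level 0 IO.

    Inputs are depreciation values (higher means more depreciated).
    Ties across levels are treated as balanced, which is an assumption.
    """
    if not level0_values or not level1_values:
        # With one level empty there is nothing to compare.
        return True
    return min(level1_values) >= max(level0_values)
```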


Figure 13 shows the VFILM prototype’s ability to maintain a threshold of available space in level 0. The x axis indicates the passage of time during a run of the prototype. The red line indicates the amount of available space (y axis) at any point in time. When the available space falls to the lower threshold, a move event is triggered, and IOs are moved until the available space reaches the Stop Move Threshold. The vertical lines indicate events, such as mission events, that affect information valuation or movement.
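
The two-threshold behavior can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the tuple layout of an IO, and the policy of moving the most depreciated IOs first are choices made for this example and are not confirmed details of the prototype.

```python
def select_ios_to_move(ios, free_space, low_threshold, stop_threshold):
    """Pick level 0 IOs to move once free space falls to the lower threshold.

    ios: list of (io_id, size, depreciation) tuples (hypothetical layout).
    Returns (ids_to_move, projected_free_space).
    """
    if free_space > low_threshold:
        # No move event: available space is still above the lower threshold.
        return [], free_space
    moved = []
    # Assumption: the most depreciated IOs are moved out of level 0 first.
    for io_id, size, _depreciation in sorted(
        ios, key=lambda io: io[2], reverse=True
    ):
        if free_space >= stop_threshold:
            break
        moved.append(io_id)
        free_space += size
    return moved, free_space
```

Using a stop threshold above the trigger threshold gives hysteresis: a single move event frees enough space that the next publication does not immediately trigger another one.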


The VDF scales linearly with the number of IOs being processed. Figure 14 shows the result of an experiment in which we published a pre-defined number of IOs and then triggered evaluation over the entire set. We ran the experiment with two different IO sizes (100 KB and 1 MB) in five different repository configurations, containing 1 MB, 25 MB, 50 MB, 75 MB, and 100 MB total. The results are plotted with a linear best fit line for each IO size, along with each fit line’s coefficient of determination (R²).¹
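
For readers who want to reproduce this kind of analysis, the sketch below computes an ordinary least-squares line and its coefficient of determination in plain Python. The function names are hypothetical, and no measurement data from the experiments is included; any inputs must be supplied by the reader.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Proportion of the variability in ys accounted for by the fit line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

Here `xs` would hold the number of IOs in each configuration and `ys` the measured evaluation times; a value of R² close to 1 indicates that a straight line accounts for nearly all of the observed variation.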


For the smaller IO size (100 KB), there is a close linear fit and the variance is low for all set sizes. For the larger IO size (1 MB), the scaling still appears to be linear, but the variance is much higher. Figure 14 also shows that the VDF does not scale with the total size (in MB) of the evaluation set. With the two series plotted together, it is apparent that scaling is dependent upon the number of IOs rather than the data size of the evaluation set. With equivalent data sizes, large IOs are evaluated much more quickly because the VDF’s evaluation function is called fewer times.

Figure 12. IO valuation histogram

Figure 13. VFILM free space graph

¹ The coefficient of determination is a measure of the proportion of variability in a data set that is accounted for by the fit line, with 1 being the best possible.

The time that the HSM takes to move IOs depends on both the number and the size of IOs being moved. We tested this with two different IO sizes (100 KB and 1 MB) and five different repository sizes (containing 10, 250, 500, 750, and 1000 IOs). The results for both IO sizes appear in Figure 15.

The coefficient of determination indicates a strong linear correlation in both cases. Figure 15 also clearly shows that the two IO sizes do not scale according to the same linear function. It takes longer to move an equivalent number of large IOs because the HSM executes the same number of move operations, but each operation requires more data to be read and written on the file system. Given the same amount of data to move, large IOs are moved more quickly because the HSM needs to execute fewer move operations in order to free the same amount of space. For m IOs each of size s, the time to execute the HSM is O(m*s).
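
Both observations are consistent with a simple cost model in which each move operation has a fixed overhead plus a cost proportional to the IO size. The sketch below is such a model; the function name and the two cost parameters are hypothetical and were not measured in the experiments.

```python
def estimated_move_time(m, s, per_op_overhead, per_unit_cost):
    """Estimated time to move m IOs of size s each.

    per_op_overhead: assumed fixed cost of one move operation.
    per_unit_cost: assumed cost of reading and writing one unit of data.
    The total grows as O(m * s).
    """
    return m * (per_op_overhead + s * per_unit_cost)


# Same number of IOs: the larger IOs take longer to move.
small = estimated_move_time(1000, 100, per_op_overhead=1.0, per_unit_cost=0.01)
large = estimated_move_time(1000, 1000, per_op_overhead=1.0, per_unit_cost=0.01)

# Same total data (100,000 units): fewer, larger IOs need fewer operations.
fewer_ops = estimated_move_time(100, 1000, per_op_overhead=1.0, per_unit_cost=0.01)
```

Under these assumed parameters, `large` exceeds `small`, while `fewer_ops` is below `small`, because the fixed overhead is paid 100 times instead of 1000 times.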

6.  CONCLUDING  REMARKS 

In this paper, we have presented a design and prototype that provide a foundation for an ILM capability in enterprise and tactical environments. The VFILM prototype includes triggering of information lifecycle management based on mission events and mission-based policy, valuation of information using fuzzy logic algorithms based on the information’s urgency to ongoing mission operations, grouping of information based on common attributes and dependencies, and migration and retrieval of information objects and groups.

Figure 14. The VDF scales linearly with the number of IOs, but not with the size of the evaluation set.

The major contributions of this paper include:


• A prototype ILM service that provides mission-aware information valuation, control of HSM movement of information between levels of storage, and support for IM services and repositories.

•  An  ILM-HSM  interface  that  abstracts  the  details  of  specific  HSMs,  file  systems,  and  repositories. 

• A novel approach to information valuation, supporting an extensible multi-factor assessment of the relative values of information using fuzzy logic. The approach produces a partial order of information depreciation, handles dynamic conditions that can change the worth of information, and avoids the thrashing that is possible with fixed or static valuation thresholds.

•  A  set  of  experimental  results  showing  the  performance  of  the  ILM  service. 

Further research building upon this foundation can explore additional richness in the VFILM prototype, e.g., expanding its mission models and the factors utilized in valuation; expanding the query service to be more aware of the hierarchical storage levels and to exploit this knowledge to order query responses; and distributing and coordinating ILMs for improved storage and access to critical information.

Figure 15. The time that the HSM takes to move IOs depends on both the number and size of IOs.